include_graphics("pic.png")

The University of California is a 10-campus research institute. It is one of three public higher education systems in the state and the only one that has large graduate and research programs. The institution prides itself on its diversity. The diversity of its student body and faculty. The diversity of the programs offered. The diversity of higher educational experience. Our questions are: does this diversity represent California? Are the UC schools as diverse as our communities? What are some disparities and what is a possible explanation for these disparities?

To answer these questions, we began looking at data from the UC’s PhD programs. This is where diversity and representation come to a head. The pool of graduate students is limited to those who have received a bachelor’s degree. These students are also handpicked by faculty rather than admitted based only on test or other milestone completion. From PhD students, however, comes a large portion of academic research. A doctoral degree is required to continue in academia. Our future tenure track professors come from PhD graduates, who will then go on to teach undergraduates and select their own graduate students.

For this project, we will look at racial, ethnic, gender, and residency and citizenship data over 4 to 10 years from the UC system’s PhD graduates. We will compare this data with United States Census data and reported faculty data. Due to using 3 different data sets, one of the main limitations we have is the mismatch of variables and methods of surveying. Additionally, the quality of data can be quite different. For example, in the PhD graduate data there are no double counts of students across racial categories. However, the US Census does allow for double counts. We have done our best with the datasets available to present the data as clearly as possible.

Lastly, as way of clarification, in the PhD graduate data, there is a category called Non-resident/Foreign. This is the category for students who, when they were admitted, were not residents of California (i.e., out of state students) or are from another country. Therefore, the racial data only includes in-state students. Despite many of these students being able get residency during their program and thus at graduation would be a resident, the data is the information given to the school at admittance and then only reports it once the student graduates.

gradsum <- read_excel("UC Info.xlsx") %>% arrange(County)
countyinfo <- read_excel("County Info.xlsx") %>% arrange(County)
statedemo <- read_excel("State Data.xlsx")
knitr::opts_chunk$set(warning = FALSE, message = FALSE)  
USA <- getData("GADM", country = "usa", level = 2)
mypal <- colorFactor(palette = c("#FEB2E0", "#72CDF4", "#FFD200", "#FFE552", "#FF6E1B","#00A3AD", "#FF8F28","#005581", "#E44C9A", "#00778B"), domain = countyinfo$County)
gradsum$`Native American & Hawaiian` <- percent(gradsum$`Native American & Hawaiian`, accuracy = 0.01)
gradsum$White <- percent(gradsum$White, accuracy = 0.1)
gradsum$Asian <- percent(gradsum$Asian, accuracy = 0.1)
gradsum$Black <- percent(gradsum$Black, accuracy = 0.1)
gradsum$Hispanic <- percent(gradsum$Hispanic, accuracy = 0.01)
gradsum$`Non-Res./Foreign` <- percent(gradsum$`Non-Res./Foreign`, accuracy = 0.1)
gradsum$Female <- percent(gradsum$Female, accuracy = 0.1)
countyinfo$White <- percent(countyinfo$White, accuracy = 1.0)
countyinfo$Asian <- percent(countyinfo$Asian, accuracy = 1.0)
countyinfo$Hispanic <- percent(countyinfo$Hispanic, accuracy = 1.0)
countyinfo$Black <- percent(countyinfo$Black, accuracy = 1.0)
countyinfo$`Native American & Hawaiian` <- percent(countyinfo$`Native American & Hawaiian`, accuracy = 1.0)
countyinfo$`Female %`<- percent(countyinfo$`Female %`, accuracy = 0.1)
countyinfo$`Not Citizen` <- percent(countyinfo$`Not Citizen`, accuracy = 0.1)
statedemo$White <- percent(statedemo$White, accuracy = 1.0)
statedemo$Asian <- percent(statedemo$Asian, accuracy = 1.0)
statedemo$Hispanic <- percent(statedemo$Hispanic, accuracy = 1.0)
statedemo$Black <- percent(statedemo$Black, accuracy = 1.0)
statedemo$`Native American & Hawaiian` <- percent(statedemo$`Native American & Hawaiian`, accuracy = 1.0)
statedemo$`Female %`<- percent(statedemo$`Female %`, accuracy = 0.1)
statedemo$`Not Citizen` <- percent(statedemo$`Not Citizen`, accuracy = 0.1)
cali_counties <- c(
  'Alameda','San Francisco','Yolo','Merced','Santa Cruz','Santa Barbara',
  'Los Angeles','Riverside','Orange','San Diego'
  )
county_data <- USA[
  (USA$NAME_2 %in% cali_counties) & (USA$NAME_1 == 'California'),
]

iconSet <- awesomeIconList(
                          stateicon = makeAwesomeIcon(
                                          icon = "globe", library = "ion", markerColor = "darkblue", iconColor = "blue"), 
                          school = makeAwesomeIcon(
                                          icon = "fa-duotone fa-graduation-cap", library = "fa", markerColor = "lightblue", iconColor = "gray" ))
gradsum %>%
  leaflet() %>% 
    addProviderTiles(providers$CartoDB.Positron , group = "Grey") %>%
    setView(lat = 36.778259, lng = -119.417931, zoom = 6) %>%
    addPolygons(
      data = county_data,
      stroke = FALSE, smoothFactor = 0.2, fillOpacity = 0.3,
      color = mypal(county_data$NAME_2),
      popup = paste0(
        "County: ",countyinfo$County,
        "<br>", "Asian: ", countyinfo$Asian,
        "<br>", "Black: ", countyinfo$Black,
        "<br>", "Latinx: ", countyinfo$Hispanic,
        "<br>", "Native American & Hawaiian: ", countyinfo$`Native American & Hawaiian`,
        "<br>", "White: ", countyinfo$White, 
        "<br>", "Non-citizen: ", countyinfo$`Not Citizen`, 
        "<br>", "Female: ", countyinfo$`Female %`)
    ) %>%
  addAwesomeMarkers(
    label = gradsum$School,
    icon = iconSet$school,
    popup = paste0(
      "PhD Graduates:", 
      "<br>", "Asian:", gradsum$Asian, 
      "<br>","Black: ", gradsum$Black, 
      "<br>", "Latinx:", gradsum$Hispanic, 
      "<br>","Native American & Hawaiian:", gradsum$`Native American & Hawaiian`, 
      "<br>", "White: ", gradsum$White, 
      "<br>", "Non-resident/Foreigner: ", gradsum$`Non-Res./Foreign`, 
      "<br>", "Female: ", gradsum$Female )
    ) %>%
  addAwesomeMarkers(
    label = "State Demographics",
    lat = 38.5816,
    lng = -121.4944,
    icon = iconSet$stateicon,
    popup = paste0(
            "Asian: ", statedemo$Asian,
            "<br>", "Black: ", statedemo$Black,
            "<br>", "Latinx: ", statedemo$Hispanic,
            "<br>", "Native American & Hawaiian: ", statedemo$`Native American & Hawaiian`,
            "<br>", "White: ", statedemo$White,
            "<br>", "Non-citizen: ", statedemo$`Not Citizen`,
            "<br>", "Female: ", statedemo$`Female %`)
  )

Above is a map of the of the 10 University of California school’s PhD graduates from 2018-2021 with demographic data from each that can be viewed with a click on the marker for the school. Additionally, the counties each school is situated in have been marked, so that with another click, the county demographics will appear. The state data is available on the dark blue marker on the capital, Sacramento. Altogether, we have a simple snapshot of each school’s diversity and representation of the community it is in and, because students will often move within the state for education, the state-level population data.

Some quick information we can glean from the map: -Native American & Hawaiian students are underrepresented at every school.

-Black students are underrepresented at every school.

-Latinx students are underrepresented at every school.

-Female students are underrepresented at 9 campuses by an average of 7.6% but overrepresented at 1 campus by 7.4%.

The picture for White and Asian students is a little more complicated. We will explore race in-depth below. We also presented information about non-citizen county and state demographics in contrast with UC non-resident/foreign information. These numbers are not equivalent, and we are unable to tease apart the non-resident students, but nevertheless, are an interesting comparison. California has a high non-citizen population but whether this is the population represented in non-resident/foreign numbers we do not know. Later, we will dig deep into the gender disparity across different campuses and among faculty.

knitr::opts_chunk$set(echo = TRUE)
allyear<-read_excel("UC'S PHD LEVEL 17 CIP18-21.xlsx")
byrace1<-allyear%>%
  filter(CIPcode!="'99'")
A1<- data.frame(CIPcode=byrace1["CIPcode"], region="Asian",total=0,sex="male",a=byrace1["Asian men"], a=byrace1["institution"],a=byrace1["Grand total men"],a=byrace1["Grand total women"],a=byrace1["Grand total"])
A2<- data.frame(CIPcode=byrace1["CIPcode"], region="Asian",total=0,sex="female",a=byrace1["Asian women"], a=byrace1["institution"],a=byrace1["Grand total men"],a=byrace1["Grand total women"],a=byrace1["Grand total"])
colnames(A1)[5]="count"
colnames(A2)[5]="count"

His1<- data.frame(CIPcode=byrace1["CIPcode"],region="Hispanic",total=0,sex="male",a=byrace1["Hispanic or Latino men"], a=byrace1["institution"],a=byrace1["Grand total men"],a=byrace1["Grand total women"],a=byrace1["Grand total"])
His2<- data.frame(CIPcode=byrace1["CIPcode"],region="Hispanic",total=0,sex="female",a=byrace1["Hispanic or Latino women"], a=byrace1["institution"],a=byrace1["Grand total men"],a=byrace1["Grand total women"],a=byrace1["Grand total"])
colnames(His1)[5]="count"
colnames(His2)[5]="count"

W1<- data.frame(CIPcode=byrace1["CIPcode"],region="white",total=0,sex="male",a=byrace1["White men"], a=byrace1["institution"],a=byrace1["Grand total men"],a=byrace1["Grand total women"],a=byrace1["Grand total"])
W2<- data.frame(CIPcode=byrace1["CIPcode"],region="white",total=0,sex="female",a=byrace1["White women"] ,a=byrace1["institution"],a=byrace1["Grand total men"],a=byrace1["Grand total women"],a=byrace1["Grand total"])
colnames(W1)[5]="count"
colnames(W2)[5]="count"

B1<- data.frame(CIPcode=byrace1["CIPcode"],region="black",total=0,sex="male",a=byrace1["Black or African American men"] ,a=byrace1["institution"],a=byrace1["Grand total men"],a=byrace1["Grand total women"],a=byrace1["Grand total"])
B2<- data.frame(CIPcode=byrace1["CIPcode"],region="black",total=0,sex="female",a=byrace1["Black or African American women"],a=byrace1["institution"],a=byrace1["Grand total men"],a=byrace1["Grand total women"],a=byrace1["Grand total"])
colnames(B1)[5]="count"
colnames(B2)[5]="count"

totalrace1<-rbind(A1,A2,His1,His2,W1,W2,B1,B2)
newdf<-aggregate(cbind(count,Grand.total.men,Grand.total.women,Grand.total)~institution+sex+region, data=totalrace1, sum)

According to the chart below, you can see that in most UC schools men are offered more PhD than women in almost every ethnicity. If we only focus on gender in each ethnicity, UC Davis, UC Merced, and UCSF are the most representative schools. If we only focus on gender in each ethnicity, UC Berkeley and UCSD are schools that are not representative. We also list out the total PhD that is given to nonresident students. Similarly, male tends to be given more doctoral degrees than women. Then, if we look at Ethnicity alone, we can see a similar pattern in almost every UC schools, either male or female.

ggplot(newdf,aes(x=region,y=count,fill=sex)) +geom_bar(position="dodge", stat="identity")+facet_wrap(~institution)+labs(x="Ethnicity",y= "Total")+coord_flip()+theme_classic()+ggtitle("Grand Total for Each UCschools Shown by Enthnicities and Sex")

totalracegood=newdf %>% 
  filter(institution=="UCDavis"|institution=="UCSF"|institution=="UCMerced")
ggplot(totalracegood,aes(x=region,y=count,fill=sex)) +geom_bar(position="dodge", stat="identity",width=0.7)+facet_grid(~institution)+labs(x="Ethnicity",y= "Total")+coord_flip() +theme_classic()+ggtitle("Good Representative Schools in Gender")

totalracebad=newdf %>% 
  filter(institution=="UCBerkeley"|institution=="UCSD")
ggplot(totalracebad,aes(x=region,y=count,fill=sex)) +geom_bar(position="dodge", stat="identity",width=0.7)+facet_grid(~institution)+labs(x="Ethnicity",y= "Total")+coord_flip() +theme_classic()+ggtitle("Bad Representative Schools in Gender")

N1<- data.frame(CIPcode=byrace1["CIPcode"],region="nonresident",total=0,sex="male",a=byrace1["Nonresident alien men"], a=byrace1["institution"],a=byrace1["Grand total men"],a=byrace1["Grand total women"],a=byrace1["Grand total"])
N2<- data.frame(CIPcode=byrace1["CIPcode"],region="nonresident",total=0,sex="female",a=byrace1["Nonresident alien women"], a=byrace1["institution"],a=byrace1["Grand total men"],a=byrace1["Grand total women"],a=byrace1["Grand total"])
colnames(N1)[5]="count"
colnames(N2)[5]="count"
totalrace2=rbind(N1,N2)
international<-aggregate(cbind(count,Grand.total.men,Grand.total.women)~institution+sex+region, data=totalrace2, sum)
ggplot(international,aes(x=region,y=count,fill=sex)) +geom_bar(position="dodge", stat="identity",width=0.7)+facet_wrap(~institution)+labs(x="Ethnicity",y= "Total")+coord_flip()+theme_classic()+ggtitle("Grand Total of nonresident students for Each UCschools")

totalrace3=rbind(A1,A2,His1,His2,W1,W2,B1,B2,N1,N2)
newdf<-aggregate(cbind(count,Grand.total.men,Grand.total.women,Grand.total)~institution+sex+region, data=totalrace1, sum)
newdf1<-aggregate(cbind(count,Grand.total.men,Grand.total.women,Grand.total)~institution+sex+region, data=totalrace3, sum)
aa_women<-newdf1%>%
  filter(sex=="female")
aa_men<-newdf1%>%
  filter(sex=="male")
aa_men$ratio <- with(aa_men,count/Grand.total.men)*100
aa_women$ratio <- with(aa_women,count/Grand.total.women)*100
proportionbygender<-rbind(aa_men,aa_women)
rm(aa_women,aa_men)
proportionbygender$region <- factor(proportionbygender$region, levels = c("black","Hispanic","Asian","nonresident","white"))
female=proportionbygender %>% 
  filter(sex=="female")
ggplot(female,aes(x=region,y=ratio)) +geom_bar( stat="identity",fill="#FF6666")+facet_wrap(~institution)+labs(x="Ethnicity",y= "Percentage(%)")+coord_flip()+theme_classic()+ggtitle("Percentage of Grants Total Given to Women of Each Ethnicity")

male=proportionbygender %>% 
  filter(sex=="male")
proportionbygender$region <- factor(proportionbygender$region, levels = c("black","Hispanic","Asian","nonresident","white"))
ggplot(male,aes(x=region,y=ratio)) +geom_bar( stat="identity",fill="#3366FF")+facet_wrap(~institution)+labs(x="Ethnicity",y= "Percentage(%)")+coord_flip()+theme_classic()+ggtitle("Percentage of Grants Total Given to Men of Each Race")

knitr::opts_chunk$set(echo = TRUE)
UCpHD_s_2010_20 <- read_csv("UCpHD's 2010-20.csv")
X10_20_UC_Faculty <- read_csv("10-20 UC Faculty.csv")
UCpHD_s_2010_20 %>%
  ggplot(aes(x = Gender, y = Awards, fill= Gender)) +
  geom_bar(stat = "identity") + scale_fill_manual(values = c("#72CDF4", "#005581")) + theme(text = element_text(size = 15)) +
  transition_time(year) +
  labs(title = paste("PhD Awards over 10 years by UC's in {frame_time}"), y = "PhD Awards", x = "Gender")

Rationale: This graph looks at the number of PhD students over 10 years by Gender at UC’s. Notice the pace of change for both men and women in ten years.

UCpHD_s_2010_20 %>%
  filter(Category == "STEM Major") %>%
  ggplot(aes(x = Gender, y = Awards, fill= Gender)) +
  geom_bar(stat = "identity") + scale_fill_manual(values = c("#FFB511", "#1295D8")) + theme(text = element_text(size = 15)) +
  transition_time(year) +
  labs(title = paste("STEM PhD Awards over 10 years by UC's in {frame_time}"), y = "PhD Awards", x = "Gender")

Rationale: This graph looks at the number of Female PhD STEM students compared with Males over 10 years. Notice the Lack of change over 4-5 years in the time period for women.

UCpHD_s_2010_20 %>%
  filter(Category == "STEM Major") %>%
  ggplot(aes(x = year, y = Awards, fill= Gender)) +
  geom_point(aes(color = Gender, size =8)) + scale_fill_manual(values = c("#72CDF4", "#005581")) + theme(text = element_text(size = 15)) +
  transition_time(year) +
  labs(title = paste("STEM PhD Awards over 10 years by UC's in {frame_time}"), y = "STEM PhD Awards", x = "Year")

Rationale: This graph was included to identify if there had been a change in the number of women admitted into STEM PhD programs by UC’s.The graph shows mostly no change in the amount of Females in PhD STEM programs. More male students in STEM PhD programs over 10 years.

X10_20_UC_Faculty %>%
  ggplot(aes(x = year, y = Professors, fill= Gender)) +
  geom_point(aes(color=Gender, size= 9)) + scale_fill_manual(values = c("#72CDF4", "#005581")) + theme(text = element_text(size = 15)) +
  transition_time(year) +
  labs(title = paste("Professors at UC's 10 years by Gender in {frame_time}"), y = "Total Professors", x = "Year")

Rationale: Tenured Professors make admissions decisions on who is accepted into PhD programs at UC’s. This graph was included to identify if there was a similar pattern by the Professors Admitting students into the program.The graph shows a similar pattern of no change in the amount of Female Tenured professors and more male professors in tenured positions over 10 years. Visual Encoding: The graph needed to match the Pattern of students in PhD programs over time, so I used the same colors and animation over time period.